OTUs selected by >30% of the all-OTU models

Using the following bacteria: Blautia (OTU 2), Blautia (OTU 4), Enterococcus (OTU 5), Enterobacteriaceae (OTU 15), Clostridium_XlVb (OTU 30), Prevotellaceae (OTU 81), Coprobacillus (OTU 101), Alloprevotella (OTU 118), Holdemania (OTU 199), Clostridium_XVIII (OTU 200), Robinsoniella (OTU 250)
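The >30% rule above can be sketched as a simple frequency count. This is a minimal illustration rather than the code used for these notes: the function name `frequently_selected`, the input layout (one list of selected OTUs per model), and the toy data are all assumptions.

```python
from collections import Counter


def frequently_selected(model_features, min_fraction=0.30):
    """Return the OTUs selected by more than `min_fraction` of the models.

    `model_features` is a list with one entry per model; each entry is the
    collection of OTU names that model selected (hypothetical layout).
    """
    n_models = len(model_features)
    if n_models == 0:
        return []
    counts = Counter()
    for features in model_features:
        # set() so an OTU is counted at most once per model
        counts.update(set(features))
    return sorted(otu for otu, n in counts.items() if n / n_models > min_fraction)


# Toy input (made up for illustration): OTUs kept by each of four models.
example_models = [
    ["Otu00002", "Otu00004", "Otu00005"],
    ["Otu00004", "Otu00005"],
    ["Otu00004", "Otu00015"],
    ["Otu00004"],
]
selected = frequently_selected(example_models)
print(selected)
```

The comparison is strict (`>`), matching the ">30%" wording; an OTU picked by exactly 30% of the models would be excluded.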

CFU and relative abundance plots

How does removing the low samples (Day 8 and CFUs around 0) affect prediction?

Removing these data points improves predictability for days 6 and 8 but not for the earlier days. If I raise the cutoff to remove all samples with CFU below 4, there is still no improvement, and in some cases performance even decreases.

Are there any abnormalities in the OTUs on any of these days?



Re-running the program and focusing only on the OTUs predicting day 9/10, to see whether the predictive features are consistent, gives the following selection (including all samples):

Otu00004 Otu00005 Otu00015 Otu00019 Otu00030 Otu00199 Otu00200 Otu00250
      14       17       13       15       13       16       17       19

Boruta confirmed the following OTUs as important for predicting day 9/10 CFU:
Otu00004 Otu00005 Otu00015 Otu00030 Otu00081 Otu00199 Otu00200 Otu00250 Otu00297
      21       22       21       21       21       22       22       22       20

Selecting OTUs by collecting the features from the most predictive community/CFU models (R^2 >= 0.6 and MSE <= 0.8), converting every % increase in MSE to a relative value, taking the median of those values for each OTU, and then keeping the OTUs that fall above the median value results in the following OTUs:
“Otu00001” “Otu00002” “Otu00004” “Otu00005” “Otu00012” “Otu00014” “Otu00015” “Otu00016” “Otu00019” “Otu00028” “Otu00030” “Otu00048” “Otu00081” “Otu00101” “Otu00118” “Otu00199” “Otu00200” “Otu00250” “Otu00297”
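
The median-based selection described above can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure actually run: the notes do not say how the % increase in MSE was made relative, so here each value is divided by the largest value in its own model, and "the median value" is taken to be the median of the per-OTU medians. The function name `select_otus`, the dictionary layout, and the toy numbers are hypothetical.

```python
from statistics import median


def select_otus(models, min_r2=0.6, max_mse=0.8):
    """Keep OTUs whose median relative importance exceeds the overall median.

    `models` is a list of dicts with keys "r2", "mse" and "inc_mse", where
    "inc_mse" maps OTU name -> % increase in MSE (hypothetical layout).
    """
    per_otu = {}
    for model in models:
        # Keep only the most predictive models (thresholds from the notes).
        if model["r2"] < min_r2 or model["mse"] > max_mse:
            continue
        top = max(model["inc_mse"].values(), default=0.0)
        if top <= 0:
            continue  # nothing informative to scale by
        for otu, value in model["inc_mse"].items():
            # Assumption: "relative" means scaled by the model's largest value.
            per_otu.setdefault(otu, []).append(value / top)

    medians = {otu: median(values) for otu, values in per_otu.items()}
    if not medians:
        return []
    cutoff = median(medians.values())
    return sorted(otu for otu, m in medians.items() if m > cutoff)


# Toy input (made up for illustration); the third model fails both thresholds.
example = [
    {"r2": 0.70, "mse": 0.5,
     "inc_mse": {"Otu00004": 20.0, "Otu00005": 10.0, "Otu00015": 5.0}},
    {"r2": 0.65, "mse": 0.7,
     "inc_mse": {"Otu00004": 8.0, "Otu00005": 8.0, "Otu00015": 2.0}},
    {"r2": 0.30, "mse": 1.2, "inc_mse": {"Otu00015": 50.0}},
]
print(select_otus(example))
```

Scaling within each model keeps a single model with large raw importance values from dominating the medians; a different normalisation (for example, dividing by the sum of the values) would change which OTUs pass the cutoff.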